GradQuant: Multilevel and Mixed-Effects Modeling

Introduction

Hierarchical data structure

  • Students at the first level, schools at the second level, districts at the third level

  • Can also refer to clients within therapists, classrooms within schools, employees within organizations, people within families, measurements within persons, or citizens within countries

Questions

  1. What is another name for multilevel models?

  2. In a data structure where data is collected for people multiple times, what is the level-1 unit? What is the level-2 unit?

Answers

  1. Mixed-effects models

  2. Measurement occasions; People

Read Data

Let’s load a dataset from Ch. 3 of Heck, R. H., Thomas, S. L., & Tabata, L. N. (2011). Multilevel and Longitudinal Modeling with IBM SPSS: Taylor & Francis.

Multilevel datasets can be quite large, so I generally favor more efficient data-reading functions such as read_csv or fread.

df <- data.table::fread("~/Documents/GitHub/MLM_GQ-F2022-WS/heck2011.csv")

Inspect Data

Let’s inspect the data. One thing that is immediately apparent is that “schcode” for school is repeated for multiple observations whereas each “id” for student is distinct/unique. This shows that there are repeated observations of students within each school.

head(df, n=50)
    schcode Rid   id female    ses femses    math   ses_mean    pro4yrc public
 1:       1   1 6701      1  0.586  0.586 47.1400 -0.2667500 0.08333333      0
 2:       1   2 6702      1  0.304  0.304 63.6100 -0.2667500 0.08333333      0
 3:       1   3 6703      1 -0.544 -0.544 57.7100 -0.2667500 0.08333333      0
 4:       1   4 6704      0 -0.848  0.000 53.9000 -0.2667500 0.08333333      0
 5:       1   5 6705      0  0.001  0.000 58.0100 -0.2667500 0.08333333      0
 6:       1   6 6706      0 -0.106  0.000 59.8700 -0.2667500 0.08333333      0
 7:       1   7 6707      0 -0.330  0.000 62.5556 -0.2667500 0.08333333      0
 8:       1   8 6708      1 -0.891 -0.891 47.0100 -0.2667500 0.08333333      0
 9:       1   9 6709      0  0.207  0.000 72.4200 -0.2667500 0.08333333      0
10:       1  10 6710      1 -0.341 -0.341 65.8400 -0.2667500 0.08333333      0
11:       1  11 6711      0 -0.171  0.000 57.3400 -0.2667500 0.08333333      0
12:       1  12 6712      1 -1.068 -1.068 62.5556 -0.2667500 0.08333333      0
13:       2   1 3703      0 -0.105  0.000 61.9528  0.6799231 1.00000000      0
14:       2   2 3704      0  1.280  0.000 70.2200  0.6799231 1.00000000      0
15:       2   3 3705      0  1.056  0.000 58.7800  0.6799231 1.00000000      0
16:       2   4 3706      0  0.800  0.000 65.5400  0.6799231 1.00000000      0
17:       2   5 3707      0  0.733  0.000 59.7700  0.6799231 1.00000000      0
18:       2   6 3708      0  0.130  0.000 64.0700  0.6799231 1.00000000      0
19:       2   7 3709      0  0.682  0.000 61.9528  0.6799231 1.00000000      0
20:       2   8 3710      0  0.917  0.000 66.8300  0.6799231 1.00000000      0
21:       2   9 3711      0  0.810  0.000 69.4100  0.6799231 1.00000000      0
22:       2  10 3712      0  0.518  0.000 56.5100  0.6799231 1.00000000      0
23:       2  11 3713      0  0.464  0.000 57.2200  0.6799231 1.00000000      0
24:       2  12 3714      0  0.552  0.000 64.8900  0.6799231 1.00000000      0
25:       2  13 3715      0  1.002  0.000 70.1200  0.6799231 1.00000000      0
26:       3   1  861      1 -0.302 -0.302 48.7600 -0.5480000 0.33333333      0
27:       3   2  862      1 -0.648 -0.648 44.6500 -0.5480000 0.33333333      0
28:       3   3  863      1 -0.448 -0.448 60.5929 -0.5480000 0.33333333      0
29:       3   4  864      1 -0.096 -0.096 44.9900 -0.5480000 0.33333333      0
30:       3   5  865      1 -0.291 -0.291 50.9000 -0.5480000 0.33333333      0
31:       3   6  866      0 -1.650  0.000 42.9400 -0.5480000 0.33333333      0
32:       3   7  867      0 -1.125  0.000 50.6700 -0.5480000 0.33333333      0
33:       3   8  868      1 -1.872 -1.872 39.6700 -0.5480000 0.33333333      0
34:       3   9  869      1 -0.602 -0.602 37.8000 -0.5480000 0.33333333      0
35:       3  10  870      0 -0.922  0.000 44.1000 -0.5480000 0.33333333      0
36:       3  11  871      1  0.228  0.228 57.2300 -0.5480000 0.33333333      0
37:       3  12  872      1  0.401  0.401 40.9700 -0.5480000 0.33333333      0
38:       3  13  873      0 -0.119  0.000 37.6200 -0.5480000 0.33333333      0
39:       3  14  874      0 -0.450  0.000 60.5929 -0.5480000 0.33333333      0
40:       3  15  875      1 -0.246 -0.246 52.2500 -0.5480000 0.33333333      0
41:       3  16  876      0 -0.868  0.000 60.5929 -0.5480000 0.33333333      0
42:       3  17  877      1 -0.986 -0.986 41.6000 -0.5480000 0.33333333      0
43:       3  18  878      0  0.132  0.000 40.3400 -0.5480000 0.33333333      0
44:       4   1 6799      1  0.966  0.966 62.7079  0.8159412 1.00000000      0
45:       4   2 6800      1  0.823  0.823 62.7079  0.8159412 1.00000000      0
46:       4   3 6801      1 -0.212 -0.212 62.7079  0.8159412 1.00000000      0
47:       4   4 6802      1  0.895  0.895 62.7079  0.8159412 1.00000000      0
48:       4   5 6803      1  1.065  1.065 69.0400  0.8159412 1.00000000      0
49:       4   6 6804      1  1.251  1.251 78.9800  0.8159412 1.00000000      0
50:       4   7 6805      1  0.761  0.761 62.7079  0.8159412 1.00000000      0
    schcode Rid   id female    ses femses    math   ses_mean    pro4yrc public

General Linear Model

  • Assumes residuals are uncorrelated

  • Observations are assumed as independent

  • Effects are independent

    Formula: \(math_{i}=β_{0}+β_{1}X_{i}+\epsilon_{i}\)

Ordinary Least Squares Model

OLS <- lm(math ~ ses, data = df)
summary(OLS)

Call:
lm(formula = math ~ ses, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.459  -4.678   1.144   5.355  47.560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 57.59817    0.09819  586.61   <2e-16 ***
ses          4.25468    0.12566   33.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.132 on 6869 degrees of freedom
Multiple R-squared:  0.143, Adjusted R-squared:  0.1429 
F-statistic:  1146 on 1 and 6869 DF,  p-value: < 2.2e-16

General Linear Model (continued)

So those t-statistics and p-values suggest that an effect of SES this large would be very unlikely if the null hypothesis were true, right? Let’s see if the effect differs between girls and boys.

OLS_Multiple <- lm(math ~ ses * female, data = df)
summary(OLS_Multiple)

Call:
lm(formula = math ~ ses * female, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-30.975  -4.596   1.172   5.288  48.265 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  58.1441     0.1393 417.506  < 2e-16 ***
ses           4.0533     0.1775  22.840  < 2e-16 ***
female       -1.0737     0.1961  -5.475 4.54e-08 ***
ses:female    0.3472     0.2510   1.383    0.167    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.115 on 6867 degrees of freedom
Multiple R-squared:  0.1469,    Adjusted R-squared:  0.1465 
F-statistic: 394.2 on 3 and 6867 DF,  p-value: < 2.2e-16
sjPlot::plot_model(OLS_Multiple, type = "pred", terms = c("ses", "female"))

Ordinary Least Squares Fixed Effects Model Plotted

There’s not much of a difference in the effect of SES between boys and girls.

df %>% 
  ggplot(mapping = aes(x = ses, y = math )) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, fullrange = TRUE)

Can you identify an issue with modeling these data using the previous approach?

We treat all schools as if they share the same slope and intercept, but surely schools differ in their average math achievement and in the effect of SES on math achievement, right?

This is the problem of non-independence: some schools may have higher or lower average math achievement, or the association of gender with math achievement may differ between schools.

We can use a neat package called performance to inspect the GLM assumptions, as well.

# install.packages("performance")
performance::check_model(OLS_Multiple)

All Slopes and Intercepts Plotted by Schools

df %>% 
  ggplot(mapping = aes(x = ses, y = math, colour = factor(schcode))) +
  geom_point() +
  geom_smooth(mapping = aes(group = schcode), method = "lm", se = FALSE, fullrange = TRUE) +
  labs(colour = "schcode") +
  theme(legend.position="none")

Simpson’s Paradox

“a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined.”

Nonindependence

Clustering occurs when individual data points are nested within higher-level units. OLS regression assumes observations are independently distributed.

Standard Errors

Ignoring the hierarchical structure and analyzing the data as if they were disaggregated leads to underestimated standard errors and, consequently, overestimated statistical significance.

Why are standard errors biased downwards by disaggregating?

  • Using the total sample size assumes that each individual contributes unique information
  • This is not the case with clustering, so the denominator (N) will be too large
    • If I know Timmy’s reading score and know that Tommy is in the same school as Timmy, I already know a little bit about Tommy’s score too

    • The unique contribution of each student is reduced because some information is shared by knowing which school they are in

Why is this a problem?

Inflated risk of Type I error! If we assume observations are independent, we think the SE is smaller than it really is, treating the data as 100 independent cases instead of 10 clusters of 10…

\(t = \frac{b}{SE}\)

\(SE=\frac{\sigma}{\sqrt{N}}\)
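To make this concrete, here is a toy sketch (all numbers are illustrative, not estimated from this dataset) of how clustering inflates the naive SE via the design effect, 1 + (m − 1) × ICC, for cluster size m:

```r
sigma <- 8      # assumed residual SD (roughly the OLS fit above)
N     <- 100    # 10 schools x 10 students each
m     <- 10     # cluster size
icc   <- 0.14   # roughly the ICC estimated later in these notes

se_naive <- sigma / sqrt(N)    # treats all 100 cases as independent
deff     <- 1 + (m - 1) * icc  # design effect: variance inflation from clustering
se_adj   <- se_naive * sqrt(deff)

c(naive = se_naive, adjusted = se_adj)  # naive ~0.80, adjusted ~1.20
```

Even with a modest ICC of 0.14, the clustering-adjusted SE is roughly 50% larger than the naive one, so naive t-statistics are too big.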

What does non-independence refer to and why is it a problem?

Theoretical/Conceptual Interest in Group-Level Effects

  • Perhaps you are interested in the effect of school characteristics on student academic performance…
    • Or the individual’s personality on changes in cognitive control across years…
  • Such questions would require a multilevel approach, whereby a level-1 outcome is regressed onto a level-2 predictor.
  • This could be taken a step further, examining cross-level interactions.
    • How does the socioeconomic status of the student interact with the resources of the school to predict student academic performance?

General Linear Mixed Model

  • Residuals may be correlated
  • Observations not assumed independent
  • Random effects model additional sources of variation

Why MIXED?

Fixed Effects

  • Population-level (i.e., average) effects that should persist across studies/samples
  • Condition/treatment/intervention effects are typically fixed, as they operate in “average” or “typical” ways that are predictable
  • Think of them as the typical effects

Random Effects

  • Trends that vary across levels of some grouping factor (e.g., participants or items)
  • Clusters of dependent data points in which the component observations come from the same higher-level group (e.g., an individual participant or item)
  • Account for the fact that observations may behave differently from the average trend
  • Think of them as the deviation/difference from the typical effects

What is “mixed” in a mixed model?

How are fixed and random effects different?

Modeling Random Effects

Do you expect there to be variation in the intercept (i.e., average) between schools?

Do you expect there to be variation in the effect of X on Y (i.e., slopes) between schools?

Random Effects in Colloquial Terms

  • “Random effects” == “by-school varying intercepts and slopes”

    • In estimating the random effects, you allow intercepts and/or slopes to vary
  • Do we expect the schools to have different average math achievement? Then we should fit a random-intercept model.

    • The fixed-effect intercept estimates the average intercept across schools, while the random intercepts estimate each school’s deviation from that average.
  • Do we expect the schools to differ in the effect of gender on math achievement? Then we should fit a random-slope model.

    • The fixed-effect slope estimates the average effect of gender on math achievement across schools, while the random slopes estimate each school’s deviation from that average effect.

When or why do we need a random intercept? When or why do we need a random slope?

Implementing Mixed Models

Load Mixed Model Package

lme4 is likely the most popular and widely used package for mixed-effects modeling in R. lmerTest estimates the degrees of freedom for each parameter in order to conduct hypothesis/significance testing.

library(lme4)
library(lmerTest)

Null Model

We can begin with a null model, sometimes called the intercept-only model. In this model, we simply model the DV with no regressors: we estimate variance at level 1 (residual variance) and level 2 (intercept variance), but include no predictors.

Level 1: \(math_{ij}=β_{0j}+R_{ij}\)

Level 2: \(β_{0j}=γ_{00}+U_{0j}\)

Combined: \(math_{ij}=γ_{00}+U_{0j}+R_{ij}\)

The elements in the parentheses represent the random effects. On the right side of the vertical bar is the random factor (i.e., schools). On the left side are the random intercepts and/or slopes for the fixed effects (i.e., random variation in each school’s intercept or slope). Outside the parentheses are the fixed effects. This is a null model, so a 1 denotes no fixed effects besides an intercept.

The random intercept allows us to account for variability in school averages.

GLMM.Null <- lmer(math ~ 1 + (1 | schcode), data = df)

In the summary below, under Random effects, you’ll notice statistics reported for both schcode and Residual. You can understand these as reflecting the between-school variance and the within-school (student-level) variance, respectively.

summary(GLMM.Null)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + (1 | schcode)
   Data: df

REML criterion at convergence: 48877.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.6336 -0.5732  0.1921  0.6115  5.2989 

Random effects:
 Groups   Name        Variance Std.Dev.
 schcode  (Intercept) 10.64    3.262   
 Residual             66.55    8.158   
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  57.6742     0.1883 416.0655   306.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How much clustering do we have?

ICCs can tell us whether a mixed model is justified. The ICC is the ratio of the between-cluster variance to the total variance. It can be interpreted as (1) the proportion of variance in the outcome that is attributable to clusters or (2) the expected correlation between outcomes of two randomly selected units from the same randomly selected cluster.

# install.packages("performance")
performance::icc(GLMM.Null)
# Intraclass Correlation Coefficient

    Adjusted ICC: 0.138
  Unadjusted ICC: 0.138

Calculate ICC manually

Level-2 variance (i.e., random-intercept variance) / total variance (i.e., random-intercept variance + residual variance):

\(\frac{\tau^2_{0} }{\tau^2_{0} + \sigma^2}\)

vars <- as.data.frame(VarCorr(GLMM.Null,comp="Variance"))[1:2,4]
vars
[1] 10.64221 66.55066
vars[1] / sum(vars)
[1] 0.1378652

What does an intraclass correlation coefficient tell us (in the context of MLMs)?

Level 1 Predictor with Varying Intercepts, Fixed Slopes

We can move to the next step and model the effect of student-level SES on student-level math achievement, allowing the intercept to vary between schools. This means each school is allowed to have a different mean math achievement at SES = 0.

GLMM.L1SES <- lmer(math ~ ses + (1 | schcode), data = df)
summary(GLMM.L1SES)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode)
   Data: df

REML criterion at convergence: 48215.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7733 -0.5540  0.1303  0.6469  5.6908 

Random effects:
 Groups   Name        Variance Std.Dev.
 schcode  (Intercept)  3.469   1.863   
 Residual             62.807   7.925   
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   57.5960     0.1329  375.6989  433.36   <2e-16 ***
ses            3.8739     0.1366 3914.6382   28.35   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
ses -0.025

Is the fixed effect of SES with varying-intercepts supported?

We can perform a likelihood ratio test (LRT) to test whether a model with an effect outperforms a model without it; such models are called “nested models”. The test compares the likelihood of the data under each model, hence the name. A small p-value (i.e., a large chi-square), together with lower deviance, AIC, or BIC, indicates that the fuller model performs better than the simpler one.

anova(GLMM.Null, GLMM.L1SES)
Data: df
Models:
GLMM.Null: math ~ 1 + (1 | schcode)
GLMM.L1SES: math ~ ses + (1 | schcode)
           npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
GLMM.Null     3 48882 48902 -24438    48876                         
GLMM.L1SES    4 48219 48246 -24106    48211 664.66  1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What proportion of L1 variance explained by additional parameters?

If we divide the difference between our null model’s level-1 variance (i.e., residual variance) and this new model’s level-1 variance by the null model’s level-1 variance, we get the proportion of level-1 variance that was reduced.

N <- sigma(GLMM.Null)^2
L1 <- sigma(GLMM.L1SES)^2

prop <- (N - L1) / N
paste0("We reduced about ", round(prop,3), " of level 1 variance")
[1] "We reduced about 0.056 of level 1 variance"

Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)

Level 2: \(β_{0j}=γ_{00}+U_{0j}\) & \(β_{1j}=γ_{10}\)

Combined: \(math_{ij}=γ_{00}+γ_{10}SES_{ij}+U_{0j}+R_{ij}\)

  1. the fixed effect for the intercept, controlling for ses; \(γ_{00}\)

  2. the fixed effect for the slope of ses; \(γ_{10}\)

  3. a random effect for the intercept capturing the variance of schools around the intercept \(\tau^{2}_{0}\), controlling for ses. Every school has a residual around the intercept, \(U_{0j}\)

  4. a random effect capturing the variance of students around their school mean math achievement, controlling for ses. \(\sigma^2\)
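These symbols map directly onto lmer output; a quick sketch, assuming the GLMM.L1SES object fit above:

```r
fixef(GLMM.L1SES)                # gamma_00 (intercept) and gamma_10 (ses slope)
head(ranef(GLMM.L1SES)$schcode)  # U_0j: each school's deviation from gamma_00
VarCorr(GLMM.L1SES)              # tau^2_0 (schcode intercept SD) and sigma (residual SD)
sigma(GLMM.L1SES)^2              # sigma^2 directly
```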

summary(GLMM.L1SES)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode)
   Data: df

REML criterion at convergence: 48215.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7733 -0.5540  0.1303  0.6469  5.6908 

Random effects:
 Groups   Name        Variance Std.Dev.
 schcode  (Intercept)  3.469   1.863   
 Residual             62.807   7.925   
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   57.5960     0.1329  375.6989  433.36   <2e-16 ***
ses            3.8739     0.1366 3914.6382   28.35   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
ses -0.025

Level 1 Predictor with Varying Intercepts, Varying Slopes

As shown in the earlier plot, it’s quite reasonable to assume that the effect of SES may vary between schools. Let’s test it:

GLMM.L1SESV <- lmer(math ~ ses + (ses | schcode), data = df)

A Brief Detour on “Implicit” Intercepts

Note that a 1 denotes an intercept, and there is an “implicit” or “default” 1 both outside the parentheses and inside the parentheses. The following syntax is equivalent to the prior formula but more “explicit”.

GLMM.L1SESV <- lmer(math ~ 1 + ses + (1 + ses | schcode), data = df)
summary(GLMM.L1SESV)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + ses + (1 + ses | schcode)
   Data: df

REML criterion at convergence: 48190.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8578 -0.5553  0.1290  0.6437  5.7098 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 schcode  (Intercept)  3.2042  1.7900        
          ses          0.7794  0.8828   -1.00
 Residual             62.5855  7.9111        
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   57.6959     0.1315  378.6378  438.78   <2e-16 ***
ses            3.9602     0.1408 1450.7730   28.12   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
ses -0.284
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

If you wanted to remove the intercept for your fixed effects and the intercept for your random effects (not recommended), you would replace these “default” 1s with 0s. You can see below that there is no longer an “(Intercept)” under random effects or fixed effects.

GLMM.L1SESV <- lmer(math ~ 0 + ses + (0 + ses | schcode), data = df)
summary(GLMM.L1SESV)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 0 + ses + (0 + ses | schcode)
   Data: df

REML criterion at convergence: 74122.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8263  0.5688  0.9386  1.1965  2.3970 

Random effects:
 Groups   Name Variance Std.Dev.
 schcode  ses  1244     35.27   
 Residual      2566     50.66   
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
    Estimate Std. Error      df t value Pr(>|t|)  
ses    3.882      1.943 422.105   1.998   0.0464 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A Brief Detour on Singular Fit

Fitting this model with a random slope for SES, you may notice the warning, “boundary (singular) fit: see help('isSingular')”. This occurs when one of your random-effects parameters (as seen in your VCOV matrix) sits at the boundary of its allowed values, e.g., a variance essentially at 0 or a correlation at ±1, either because the parameter truly is approximately at that boundary or because of multicollinearity.

GLMM.L1SESV <- lmer(math ~ ses + ( ses | schcode), data = df)

Let’s inspect the variance-covariance matrix and the random effects. The VCOV shows no variances near 0. But the random effects correlation shows a -1.0 correlation.

Matrix::bdiag(VarCorr(GLMM.L1SESV))
2 x 2 sparse Matrix of class "dgCMatrix"
            (Intercept)        ses
(Intercept)    3.204184 -1.5802590
ses           -1.580259  0.7793617
summary(GLMM.L1SESV)$varcor
 Groups   Name        Std.Dev. Corr  
 schcode  (Intercept) 1.79002        
          ses         0.88281  -1.000
 Residual             7.91110        

What should we inspect if we observe a “singular fit”?

We can attempt to fix the singular fit by removing this perfect correlation. Specifying the random intercept for schools, (1 | schcode), and the random slope of SES for schools, (0 + ses | schcode), as separate terms removes the correlation parameter. Voila! The perfect correlation is gone and the singularity is fixed!

GLMM.L1SESV <- lmer(math ~ ses + ( 1 | schcode) + (0 + ses | schcode), data = df)
summary(GLMM.L1SESV)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode) + (0 + ses | schcode)
   Data: df

REML criterion at convergence: 48213.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7791 -0.5526  0.1327  0.6466  5.7089 

Random effects:
 Groups    Name        Variance Std.Dev.
 schcode   (Intercept)  3.3222  1.8227  
 schcode.1 ses          0.7205  0.8488  
 Residual              62.5213  7.9070  
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  57.5888     0.1328 374.9738  433.55   <2e-16 ***
ses           3.8803     0.1435 377.2408   27.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
ses -0.023

Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)

Level 2: \(β_{0j}=γ_{00}+U_{0j}\) & \(β_{1j}=γ_{10}+U_{1j}\)

Combined: \(math_{ij}=γ_{00}+γ_{10}SES_{ij}+U_{0j}+U_{1j}SES_{ij}+R_{ij}\)

  1. the fixed effect for the intercept, controlling for ses; \(γ_{00}\)

  2. the fixed effect for the slope of ses; \(γ_{10}\)

  3. a random effect capturing the variance of students around their school’s mean math achievement, controlling for ses; \(\sigma^{2}\)

  4. a random effect for the intercept capturing the variance of schools around the intercept, controlling for ses; \(\tau^{2}_{0}\)

  5. a random effect for the slope capturing variance of school slopes around the grand mean slope, controlling for ses; \(\tau^{2}_{1}\)

  6. a random effect covariance capturing how the intercept variance and slope variance relate to each other, \(\tau_{01}\) (constrained to 0 in the uncorrelated specification fit above)

summary(GLMM.L1SESV)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ ses + (1 | schcode) + (0 + ses | schcode)
   Data: df

REML criterion at convergence: 48213.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7791 -0.5526  0.1327  0.6466  5.7089 

Random effects:
 Groups    Name        Variance Std.Dev.
 schcode   (Intercept)  3.3222  1.8227  
 schcode.1 ses          0.7205  0.8488  
 Residual              62.5213  7.9070  
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  57.5888     0.1328 374.9738  433.55   <2e-16 ***
ses           3.8803     0.1435 377.2408   27.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
ses -0.023

Modeling a L2 Predictor

In a multilevel model, we don’t have to only model predictors that vary across level-1 units, but can also model predictors that are the same across level-1 units but differ across level-2 (or level-3, or level-4 units). Here, that would be depicted as school-level units. Let’s focus on the categorical variable of whether the schools were public or private.

GLMM.L1SESV.L2PUB <- lmer(math ~ 1 + ses + public + (ses|schcode), data = df)

Level 1: \(math_{ij}=β_{0j}+β_{1j}SES_{ij}+R_{ij}\)

Level 2: \(β_{0j}=γ_{00}+γ_{01}public_{j}+U_{0j}\) & \(β_{1j}=γ_{10}+U_{1j}\)

Combined: \(math_{ij}=γ_{00}+γ_{01}public_{j}+γ_{10}SES_{ij}+U_{0j}+U_{1j}SES_{ij}+R_{ij}\)

Note: You may see that \(public_{j}\) predicts only the level-2 intercept, not the slope. This is intentional and models only a main effect of the level-2 predictor. Adding the level-2 predictor to the slope equation would instead produce a cross-level interaction.

  1. the fixed effect for the intercept, controlling for ses and public; \(γ_{00}\)

  2. the fixed effect for the slope of public controlling for ses; \(γ_{01}\)

  3. the fixed effect for the slope of ses controlling for public; \(γ_{10}\)

  4. a random effect for the intercept capturing the variance of schools around the intercept, controlling for ses and public; \(\tau^{2}_{0}\)

  5. a random effect capturing the variance of students around their school mean math achievement, controlling for ses and public; \(\sigma^{2}\)

  6. a random effect for the slope capturing variance of school slopes around the grand mean slope, controlling for ses; \(\tau^{2}_{1}\)

  7. a random effect covariance capturing how the intercept variance and slope variance relate to each other; \(\tau_{01}\)

summary(GLMM.L1SESV.L2PUB)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: math ~ 1 + ses + public + (ses | schcode)
   Data: df

REML criterion at convergence: 48190.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8542 -0.5543  0.1299  0.6421  5.7122 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 schcode  (Intercept)  3.2147  1.793         
          ses          0.7904  0.889    -1.00
 Residual             62.5840  7.911         
Number of obs: 6871, groups:  schcode, 419

Fixed effects:
             Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)   57.8209     0.2469  395.1186 234.162   <2e-16 ***
ses            3.9631     0.1410 1433.2714  28.102   <2e-16 ***
public        -0.1699     0.2855  391.3491  -0.595    0.552    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
       (Intr) ses   
ses    -0.122       
public -0.846 -0.036
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Distinguishing Random Factors and Random Effects

  • A factor is a variable.

  • An effect is the variable’s coefficient.

Full formula: \(math_{ij}=γ_{00}+γ_{10}ses_{ij}+U_{0j}+U_{1j}ses_{ij}+R_{ij}\)

Where we have two fixed effects: \(γ_{00}\) and \(γ_{10}\)

We have three random effects: the random intercept \(U_{0j}\), the random slope of SES \(U_{1j}ses_{ij}\), and the level-1 residual \(R_{ij}\)

And one random factor: school, which is not seen directly in the formula but is denoted by the subscript “j”. That’s because the fixed terms average over all the schools, whereas the random terms vary per school.

Moving on…

What if we want to know how school type affects the slope of ses, though? In other words, is there a difference in the effect of SES on math achievement in a private or public school?

Cross-Level Interaction

One of the biggest advantages of using MLMs/MEMs is that you can estimate cross-level interactions between variables at higher-level units and variables at lower-level units. Here, this is reflected by predicting the SES slopes (L1: math ~ SES) from school type (L2: SES slope ~ public).

GLMM.CL <- lmer(math ~ 1 + ses * public + (ses|schcode), data = df)

The parameters in this model are identical to those of the random-intercepts and random-slopes model estimated prior, except for one additional parameter:

  1. the fixed effect for the effect of public on the slope of ses; \(γ_{11}\)

Why is the cross-level interaction modeled as the slope of SES predicted by public? Let’s do the math.

Level 1: \(math_{ij}=β_{0j}+β_{1j}ses_{ij}+R_{ij}\)

Level 2 Slope: \(β_{1j}=γ_{10}+γ_{11}public_{j}+U_{1j}\)

Substitute the level-2 equation for \(β_{1j}\) into level 1:

\(math_{ij}=β_{0j}+(γ_{10}+γ_{11}public_{j}+U_{1j})ses_{ij}+R_{ij}\)

Combined: \(math_{ij}=β_{0j}+γ_{10}ses_{ij}+γ_{11}ses_{ij}*public_{j}+U_{1j}ses_{ij}+R_{ij}\)

Now, we have slopes predicted by the level-2 variable, which estimates a cross-level interaction.
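As with the OLS interaction earlier, sjPlot can visualize the cross-level interaction; a sketch assuming the GLMM.CL model fit above:

```r
# predicted math achievement across ses, separately for public vs. private schools
sjPlot::plot_model(GLMM.CL, type = "pred", terms = c("ses", "public"))
```

Diverging lines in this plot would indicate that the SES slope depends on school type.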

Crossed and Nested Random Effects

Nested Random Effects

Every lower-level unit belongs to exactly one higher-level unit

e.g. Different classes for each school (or different students for each class)


Nested Random Effects Syntax

# (1|School/Class)

Crossed Random Effects

Every lower-level unit appears in every higher-level unit.

e.g. Same classes for each school. (Intuitive psychology example: Same stimuli for each participant)

Crossed Random Effects Syntax

# (1|School) + (1|Class)

Example of Crossed Random Effects: Stimuli and Participants

head(sim_dat)
# A tibble: 6 × 5
  subj_id item_id category   X_i    RT
    <int>   <int> <chr>    <dbl> <dbl>
1      39       1 ingroup   -0.5  497.
2      39       2 ingroup   -0.5  574.
3      39       3 ingroup   -0.5  512.
4      39       4 ingroup   -0.5  235.
5      39       5 ingroup   -0.5  848.
6      39       6 ingroup   -0.5  131.
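sim_dat is not constructed in these notes; here is one hedged sketch of how such fully crossed data could be simulated (all generative values below are made up, chosen only to resemble the output above):

```r
set.seed(2022)
n_subj <- 100; n_item <- 50

# items carry the condition: deviation-coded category (assumed coding)
items <- data.frame(item_id  = 1:n_item,
                    category = rep(c("ingroup", "outgroup"), each = n_item / 2))
items$X_i <- ifelse(items$category == "ingroup", -0.5, 0.5)

# fully crossed: every subject responds to every item
sim_dat <- merge(expand.grid(subj_id = 1:n_subj, item_id = 1:n_item),
                 items, by = "item_id")

# grand mean 800 ms, fixed effect of X_i = 80 ms, plus random deviations
u_subj <- rnorm(n_subj, sd = 110)  # by-subject intercept deviations
u_item <- rnorm(n_item, sd = 80)   # by-item intercept deviations
sim_dat$RT <- 800 + 80 * sim_dat$X_i +
  u_subj[sim_dat$subj_id] + u_item[sim_dat$item_id] +
  rnorm(nrow(sim_dat), sd = 200)   # residual noise

nrow(sim_dat)  # 5000 rows: 100 subjects x 50 items
```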

Implement a Crossed Random Effects Model

By-Participant and By-Stimuli Random-Intercepts and By-Participant Random-Slopes for \(X_{i}\) on reaction time

mod_sim <- lmer(RT ~ 1 + X_i + (1 | item_id) + (1 + X_i | subj_id),
                data = sim_dat)

summary(mod_sim)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: RT ~ 1 + X_i + (1 | item_id) + (1 + X_i | subj_id)
   Data: sim_dat

REML criterion at convergence: 67724.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.7952 -0.6419  0.0011  0.6540  3.4639 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subj_id  (Intercept) 12645    112.45       
          X_i          1260     35.49   0.37
 item_id  (Intercept)  5960     77.20       
 Residual             41024    202.54       
Number of obs: 5000, groups:  subj_id, 100; item_id, 50

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)   816.65      15.93 123.63  51.256  < 2e-16 ***
X_i            82.77      22.85  50.21   3.622  0.00068 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
X_i 0.041 

Miscellaneous Advanced Topics

Partial Pooling Improves Predictions

  • Pulls more extreme estimates towards an overall average

  • Information from all groups can be used to inform parameter estimates for each individual group

Prediction, Pooling, and Shrinkage - Conductrics

  • Gives more “weight” to groups with more data, so estimates for small groups are shrunk more strongly toward the overall average
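A quick simulated illustration of this shrinkage (hypothetical data; the tiny group 1 gets pulled hardest toward the grand mean, while the large group 2 barely moves):

```r
library(lme4)

set.seed(1)
# unbalanced groups: group 1 has 3 observations, group 2 has 100
sizes <- c(3, 100, rep(25, 18))
dat <- data.frame(g = factor(rep(1:20, times = sizes)))
dat$y <- rnorm(sum(sizes), mean = 10) +
  rep(rnorm(20, sd = 0.5), times = sizes)  # true group effects

m <- lmer(y ~ 1 + (1 | g), data = dat)

# no-pooling estimates (raw group means) vs. partial-pooling estimates
raw    <- tapply(dat$y, dat$g, mean)
pooled <- coef(m)$g[, "(Intercept)"]

# partially pooled estimates sit closer to the overall mean on average
mean(abs(pooled - fixef(m))) < mean(abs(raw - fixef(m)))
```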

Effect Sizes in Mixed Models

  • Conditional R2: takes both the fixed and random effects into account.

  • Marginal R2: considers only the variance of the fixed effects.

Useful Packages for Effect Sizes for Mixed Models

#library(r2glmm)  # provides the r2beta() function
#library(r2mlm)
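A hedged usage sketch applying these to the mod_sim model fit above (argument values are illustrative; see each package’s documentation):

```r
# r2glmm::r2beta(mod_sim, method = "nsj")  # semi-partial R2 per fixed effect
# r2mlm::r2mlm(mod_sim)                    # variance-explained decomposition
```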

Convergence Issues

Optimizers cannot always find the combination of parameters that maximizes the likelihood of observing your data. When an optimizer fails to find such a solution, this is called non-convergence. Three settings you can adjust are:

  1. Number of iterations. If you increase the number of iterations, the algorithm will search for longer. This is the equivalent of having our puzzle-doer sit at the table longer, trying more pieces and combinations.

  2. Algorithm: the algorithm determines how the optimizer chooses its next attempted solution. What strategy is our puzzle-doer using to fit pieces into the puzzle?

  3. Tolerance: this can get a bit technical and vary depending on context, so we suggest Brauer and Curtin (2018) for more detail. But in our case, we can think of it as the algorithm’s tolerance for differences in solutions. Lower tolerance means slightly different solutions will be seen as different, whereas higher tolerance means two different solutions that are still kind of close will be treated as essentially the same. Maybe our puzzle-doer needs glasses; tolerance is like whether they’re wearing their glasses and can distinguish between two close-but-not-identical assembled puzzles.
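These three knobs map onto the control argument of lmer/glmer. A sketch under the assumption that the bobyqa optimizer is used (the values are illustrative, not recommendations, and the exact tolerance argument names depend on the optimizer chosen):

```r
library(lme4)

ctrl <- lmerControl(
  optimizer = "bobyqa",            # 2. algorithm
  optCtrl = list(maxfun = 2e5)     # 1. more iterations / function evaluations
)
# 3. tolerances also go in optCtrl; for the nloptwrap optimizer, e.g.:
# optCtrl = list(xtol_abs = 1e-8, ftol_abs = 1e-8)

# then pass it to the fit, e.g.:
# lmer(math ~ ses + (1 | schcode), data = df, control = ctrl)
```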

Try different optimizers

# Iterate through a set of optimizers, report convergence results
diff_optims <- allFit(GLMM.CL, maxfun = 1e5)
bobyqa : [OK]
Nelder_Mead : [OK]
nlminbwrap : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]

Check for convergence flags

diff_optims_OK <- diff_optims[sapply(diff_optims, is, "merMod")]
lapply(diff_optims_OK, function(x) x@optinfo$conv$lme4$messages)
$bobyqa
[1] "boundary (singular) fit: see help('isSingular')"

$Nelder_Mead
[1] "boundary (singular) fit: see help('isSingular')"

$nlminbwrap
[1] "boundary (singular) fit: see help('isSingular')"

$`optimx.L-BFGS-B`
[1] "boundary (singular) fit: see help('isSingular')"

$nloptwrap.NLOPT_LN_NELDERMEAD
[1] "boundary (singular) fit: see help('isSingular')"

$nloptwrap.NLOPT_LN_BOBYQA
[1] "boundary (singular) fit: see help('isSingular')"

Here every optimizer technically converges but returns a singular fit (a random-effects variance estimated at or near zero), which usually signals that the random-effects structure is more complex than the data support; simplifying the random effects is often the remedy. To change the optimizer or increase the number of iterations in your own lmer/glmer call, include a control argument such as:

# control = glmerControl(optimizer = "bobyqa",
#                        optCtrl = list(maxfun = 2e5))

Power Analysis for Multilevel Models

simr: Simulation-Based Power Analysis in R

https://humburg.github.io/Power-Analysis/simr_power_analysis.html
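A minimal simr workflow sketch, assuming a fitted lmer model mod with a ses fixed effect (the effect size, nsim, and target of 100 schools are placeholder values):

```r
# library(simr)
# fixef(mod)["ses"] <- 0.3                            # set the effect size to power for
# powerSim(mod, test = fixed("ses"), nsim = 100)      # power at the current sample size
# mod_ext <- extend(mod, along = "schcode", n = 100)  # grow the design to 100 schools
# powerCurve(mod_ext, along = "schcode", nsim = 100)  # power across numbers of schools
```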

Mathieu et al. (2012) Power Tool for Cross-Level Interactions

https://aguinis.shinyapps.io/ml_power/

Murayama et al. (2022) Summary-Statistics-Based Power Analysis

https://koumurayama.shinyapps.io/summary_statistics_based_power/

An Interactive Tutorial

http://mfviz.com/hierarchical-models/